As of December 2013, it is abundantly clear that the Affordable Care Act (ACA) is a remarkably contentious issue from a political standpoint. As with any political controversy, the debate surrounding this effort at health care reform is heavily distorted by misleading information, which leads many to debate fictitious, irrelevant, or less than significant aspects of the law. To quote the venerable President Fowler from Sum of All Fears (yes, my family and I just watched this movie over the Christmas break), there is "too much bullshit, and not enough facts."
What follows seeks to provide a modest primer on the ACA (a.k.a. Obamacare), fortified by data. Inevitably, it will be colored by the materials to which I have been exposed, but I am open to addendum or amendment should the need arise. In particular, the following components will be addressed:
My motivation for writing this script admittedly extends beyond just collecting these thoughts in one place. Given the need to integrate spatially-referenced categorical and interval data, this Notebook provides an opportunity to work with some very exciting libraries in Python.
This Notebook is intended to serve both those who care only about the content and those who want to see the nuts and bolts of working with the data. Consequently, commentary about coding choices and approaches will be entirely contained within the code cells. Doing so permits the creation of a document that hides all coding elements.
'''The first step is loading the toolset environment. We will also set some global parameters'''
#Load relevant libraries
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import geopandas as gp
import vincent as vt
from vincent import *
import pandas.io.data as web
from IPython.display import HTML
import pysal as ps
import seaborn
#matplotlib is needed for the geopandas and boxplot figures generated below
import matplotlib.pyplot as plt
#Set print width
pd.set_option('display.width',100)
This primer draws upon four primary sources that the author feels capture a great deal of useful information about the subject. This set of sources is not intended to be exhaustive (remember, this is a limited exploration of the subject), nor will it capture all aspects of this debate. This set can, however, be digested in a relatively short period of time, which makes it easier for one to check on the elements discussed in this Notebook. Furthermore, the last two papers reflect very recent thinking on the subject, over three and a half years after the passage of the ACA. The sources are as follows:
The data displayed in figures below will be taken largely from three sources:
To understand the ACA, one must have a basic idea of the insurance concept. Insurance is a means of transferring risk from beneficiaries (e.g. individuals like you or me) to insurance providers (e.g. Blue Cross Blue Shield). In a very basic sense, a pure, profitless insurance transaction relies upon the concept of expected value to match the value of a stream of "small" recurrent payments to the probabilistic cost of a "big" payment.
\[\sum_t s_t = E(B) = \int_{-\infty}^{\infty} b\, f(b)\, db = \sum_i p_i b_i\]
where \(s_t\) = small payment at time \(t\) and \(b_i\) = big payment associated with event \(i\).
Let's unpack this a bit to see what's happening. The small payment term, \(\sum_t s_t\), represents the periodic payments that an individual would make to an insurance provider. They are called insurance premia. The payment periods are plan-specific and may be monthly, quarterly, etc.
The expected loss term, \(E(B)\), is the expectation of costs due to big events (e.g. heart attack) that occur during the insurance term. The last two terms, \(\int_{-\infty}^{\infty}b f(b) db\) & \(\sum_i p_i b_i\), are the continuous and discrete methods of calculating the expected loss term. Both effectively say that the expected loss associated with a big event (\(E(B)\)) is equal to the cost of that event (\(b\)) multiplied by the probability of its occurrence (\(p\)). The summation just makes sure we include all relevant big events.
So, for example, if a heart attack costs $10,000, and the probability of an individual having one is 10% over the insurance term, the expected value is $1,000. This expected value is the liability incurred by an insurance provider if they take on this hypothetical individual. To be made whole, the provider must receive a stream of insurance premium payments from the individual that sum to $1,000 over the insurance term.
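The arithmetic above is simple enough to sketch directly. The events, costs, and probabilities below are hypothetical, not drawn from the sources used in this Notebook; the only point is to show the expected loss calculation and the implied break-even premium.

```python
#Hypothetical big events: (name, cost, probability over the insurance term)
events = [
    ('heart attack', 10000.0, 0.10),
    ('broken arm',    2500.0, 0.04),
]

#Expected loss: E(B) = sum_i p_i * b_i
expected_loss = sum(p * b for (_, b, p) in events)

#Break-even premium per period, assuming 12 monthly payments over the term
periods = 12
premium = expected_loss / periods

print(expected_loss)       # 1100.0
print(round(premium, 2))   # 91.67
```

Adding more events to the list simply adds more \(p_i b_i\) terms to the sum, which is all the summation in the formula above is doing.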
Everyone has some exposure to the insurance concept, so why bother to break it down? The reason is to highlight what it is not. Insurance is purely a risk transferring operation that attempts to mitigate the cash flow difficulties associated with catastrophic events. Individuals can accommodate small premium payments over time, but if large events come along that exceed the individual's income in a given time period (and available savings), individuals must either incur debt or go without. (Debt can be an efficient way to finance investments, but it becomes problematic when the volume of debt dwarfs the income needed to service it.)
Insurance is related to, but not the same thing as, access to care. Access to care means that an individual can actually acquire and consume the health services that are required at any given time. By contracting with an insurance provider, an individual increases the set of available health services that can be financed at any given time. Whether or not the health services are available is a separate concern, and this distinction is critical to the ultimate performance of the ACA.
Perhaps the most well-known fact about health care in the US is that we spend a lot on it, and that amount grows each year. Consider the growth in health care related personal expenditures. Note that this captures only a portion of total health activity in the country.
'''Displaying vincent graphics within the Notebook requires that we first initialize this component of the package'''
vt.core.initialize_notebook()
'''To begin, we will pull just the personal consumption expenditures related to health, and plot them with vincent.
Note that after we create the plot object, we will need to switch the data type of the x-axis to ordinal with the scales
method. This is because we are dealing with temporal information, which vincent does not yet appear to accommodate
seamlessly when working with pandas DataFrames (DFs)'''
#Capture health expenditures
hth_exp=web.get_data_fred('DHLCRC1Q027SBEA','1/1/1970','12/1/2013')
#Create area object
hth_exp_area=vt.Area(hth_exp)
#Switch the x-axis scale type to ordinal (per the note above)
hth_exp_area.scales['x'].type='ordinal'
#Create axis labels
hth_exp_area.axis_titles(x='Quarter',y='Health Expenditures ($B)')
#Set number of tick marks
hth_exp_area.axes[0].ticks=10
hth_exp_area.display()
Clearly these expenditures are increasing over time. The relevant question is, however, does this matter? In general, looking at expenditures for health (or any social expenditure item) in a vacuum is of limited value. Our expenditure in health is over seven times as large now as it was 30 years ago. I guess that's passingly interesting, and it certainly mirrors the kind of information often tossed about in the news. But again, so what?
That information does not provide any operational knowledge. We spend more on a lot of things than we did 30 years ago. We cannot make policy based upon that information alone. The relevant question (or one of them anyway) centers on opportunity cost. If we are spending \(x\) on health care, how much do we give up in other areas? This is one of the few cases in which making a comparison to a household budget actually makes sense. If one spent 25% of the household budget on going out to eat last year, and then spent 30% of the budget on the same activity this year, everything else gets squeezed. Just as our household budget is set by the income we pull in, the relevant budget constraint for the country is GDP. Let us reconsider health expenditures as a portion of GDP.
'''We will perform a similar operation here as we did with the health expenditure data above. We are only
inserting the newly calculated fraction as our series.'''
#Capture GDP
gdp=web.get_data_fred('GDP','1/1/1970','12/1/2013')
#Combine health expenditure and GDP information
hth_gdp=DataFrame(hth_exp).join(gdp)
#Rename columns
hth_gdp.columns=['hth','gdp']
#Calculate health expenditure as % of GDP
hth_gdp['hth_gdp']=hth_gdp['hth']/hth_gdp['gdp']
#Create area object
hth_gdp_area=vt.Area(hth_gdp['hth_gdp'])
#Switch the x-axis scale type to ordinal, as above
hth_gdp_area.scales['x'].type='ordinal'
#Create axis labels
hth_gdp_area.axis_titles(x='Quarter',y='Health Expenditures (% of GDP)')
#Set number of tick marks
hth_gdp_area.axes[0].ticks=10
#Set color of chart
hth_gdp_area.colors(brew='RdBu')
hth_gdp_area.display()
This chart is a bit more alarming. By measuring expenditures relative to income, we see that health expenditures are increasingly crowding out other investments. That being said, the relative flattening of the curve in the last five years has led many to speculate about whether or not the dire fiscal implications of the historical trend are still as likely to come to pass.
Placing US health expenditures in international context is more jarring still. The table below displays total health expenditure (not just personal consumption like above) as a percentage of GDP across OECD countries.
HTML('<iframe'+\
' src=http://www.keepeek.com/Digital-Asset-Management/embed-oecd/social-issues-migration-health/'+\
'total-expenditure-on-health-2013-2_hlthxp-total-table-2013-2-en'+\
' width=1000 height=1200></iframe>')
The US clearly stands out as a very high expenditure country in the area of health, which seems to be quite at odds with our health outcomes. An NIH-convened panel composed of personnel from the National Research Council and the Institute of Medicine generated a report titled U.S. Health in International Perspective: Shorter Lives, Poorer Health. As may be guessed from the title, the findings are suboptimal.
These contextual factors fundamentally influenced the health care reform effort in recent years. Even if health expenditures have slowed, there is ample cause to seek better outcomes and lower cost.
Even though the picture is arguably much worse now than in the past, many analysts saw reason for concern years ago. In fact, although there is a rather meaningless semantic argument over whether or not they were the first to do so, there can be no doubt that the Heritage Foundation (a prominent conservative think tank lobbying against Obamacare with remarkable zeal) strongly supported a plan that mirrors the ACA concept (Butler, 1989). As the report outlines, they saw three fundamental problems with the health care system:
For those with employer-based insurance, benefits are tax-free. Since consumers are shielded from the full cost, they are likely to demand a greater than optimal service volume. In the health insurance market, this also has the effect of throwing a wedge between employer-covered individuals and those who must operate on the individual market. The latter pay for benefits with after-tax money.
To deal with this issue, Heritage proposed to make all health care benefits taxable. Furthermore, a 20 percent tax credit would be provided for all insurance purchases that met basic requirements. An additional tax credit would be offered to offset out-of-pocket expenses. Insofar as this credit increased with the percentage of income allocated to health expenditures, it would be a notably progressive mechanism. The idea was to decouple the pricing scheme from employment, thereby increasing competition across the group and non-group markets.
In particular, individuals who are close to, but not under, the welfare threshold are typically not eligible for Medicaid. The tax credits above were also intended to help these individuals who "cannot afford protection." The vehicle for this subsidy was the refundable property of said credits.
Despite our decision to reject this approach in other areas (e.g. auto insurance), we have only just decided that the negative externalities associated with individuals who forgo health insurance coverage are a legitimate social problem.
Heritage argued that an individual mandate was necessary. It would place responsibility for health care coverage on individuals and not businesses. Further, it honors what was referred to as an "implicit contract" between individuals and society insofar as society should feel obligated to assist those who have fallen into dire health circumstances.
How similar could the Heritage Plan and the ACA really be?
The Heritage Plan offered in (Butler, 1989) outlines the following four objectives:
The ACA outlines the following five objectives:
If these goals sound very similar, it's not your imagination. To be sure, there are subtle differences. For example, HP4 seeks to "encourage innovation" while ACA5 seeks to make specific strategic investments in concert with ACA3 which seeks to make health care expenditure more productive. However, to argue that these efforts were not cut from the same cloth is a remarkable stretch.
To borrow from (Gruber,2011), the foundation of the ACA is a "three-legged stool". This approach looks a lot like the Heritage Plan, but there are some important differences which will become clear.
In this context, non-group insurance simply refers to the individual insurance market. The ACA goes beyond the tax treatment of benefits to attack specific attributes of the market that were deemed problematic. "These include outlawing exclusions for pre-existing conditions and other discriminatory practices, guaranteeing access to non-group insurance, and imposing limits on the availability of insurers to charge differential prices by health status."
Indeed, these are all reforms that poll extremely well when not associated directly with the name Obamacare. Unfortunately, many people seem not to realize that they are part of the law.
In addition to these reforms, minimum standards for "essential benefits" and provisions intended to put downward pressure on overhead were also included. These efforts are a bit more controversial. The first is the primary reason why President Obama has been called out for his ill-advised assertion: "If you like it, you can keep it." The plans that have been cancelled in recent months largely did not comply with the minimum standards set by the ACA. In other words, according to the ACA these plans could not provide an adequate level of protection.

It should also be noted that the practical consequence of setting a benefit floor is upward pressure on the premiums paid by individuals. Why is this the case? Ceteris paribus, a more robust insurance package (one that covers more services) will cost more than a less robust insurance package. In other words, it's not that the same insurance is more expensive. Rather, the cost is growing because it's a fundamentally different product. Some argue that this represents an unreasonable reduction in consumer choice. Others argue that many of the plans that are now defunct really didn't provide insurance coverage as popularly perceived anyway. Given that the value of purchasing insurance declines as the deductible rises, it is difficult to argue against the latter (in the case of the very low cost/high deductible plans that have been cancelled). The downward pressure on overhead is not without its own controversy, to be discussed below.
This component of the reform is taken directly from earlier iterations of health care reforms of this type (including the Heritage Plan). The goal here is to avoid the problem of adverse selection that plagues insurance pools. In a nutshell, insurance providers are able to hold individual risk by pooling these risks together in a way that limits extreme movements in the entire pool. This is a very similar concept to portfolio theory. Adverse selection occurs when low-cost individuals fail to enter the market, leaving no one to offset high-risk individuals. The individual mandate is a long-recognized method of adjusting the microeconomics of the decision to enter the market, primarily targeting healthy (low-cost) individuals.
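The adverse selection mechanism described above can be illustrated with a toy simulation. The cost distribution below is entirely made up (gamma-distributed annual costs are just a convenient skewed shape); the point is only to show that when the lowest-cost individuals exit the pool, the break-even premium for those who remain must rise.

```python
import numpy as np

np.random.seed(42)

#Hypothetical pool: expected annual costs for 10,000 individuals,
#drawn from a right-skewed (gamma) distribution
costs = np.random.gamma(shape=2.0, scale=1500.0, size=10000)

#Break-even community-rated premium when everyone enrolls
full_pool_premium = costs.mean()

#If the healthiest (lowest-cost) half opts out, the average cost of the
#remaining pool rises, forcing the break-even premium up -- and the higher
#premium then pushes out the next-healthiest tier, and so on
sick_half = np.sort(costs)[len(costs)//2:]
shrunk_pool_premium = sick_half.mean()

print(full_pool_premium < shrunk_pool_premium)  # True
```

The individual mandate attacks exactly this dynamic: by penalizing exit, it keeps the low-cost half of the distribution in the pool.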
Clearly this has been a point of serious contention, despite extant parallel practices in the auto market. The constitutionality of this provision is somewhat different than in the case of the auto market (state constitutions retain a broader set of residual powers relative to the federal constitution). However, as the Supreme Court found last year, the penalty associated with violating the individual mandate is basically a tax.
The problem with the individual mandate is that compliance with the law must be achievable. Even with other reforms (or perhaps because of them), premiums will be higher than what some people can reasonably be expected to pay. The ACA attacks this issue in two ways. First, it expands Medicaid to cover all individuals with incomes below 133% of the federal poverty level. (In 2011, this was an annual income of less than $10,820 for individuals and $22,050 for a family of four.) The Supreme Court found that such expansion would have fiscal implications for states and therefore could not be forced upon them. The decision by each state to expand Medicaid has been quite partisan.
The second method has been to offset the cost directly through tax credits, much like those proposed in the Heritage Plan. The tax credits are designed to cap the percentage of income that qualifying individuals pay for insurance. The cap begins at 3% for those right above 133% of the poverty line, and it increases to 9.5%. Eligibility expires at 400% of the poverty level.
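The premium cap schedule just described can be sketched as a small function. Note that the linear interpolation between the 3% and 9.5% endpoints is an illustrative simplification of my own; the actual statutory schedule moves in steps across income bands.

```python
def premium_cap(income_pct_fpl):
    '''Approximate share of income a qualifying individual pays for
    insurance, given income as a percentage of the federal poverty level
    (FPL). Linearly interpolates between 3% at 133% FPL and 9.5% at 400%
    FPL -- a simplification of the actual stepwise schedule.'''
    if income_pct_fpl <= 133.0:
        return None  #covered by the Medicaid expansion instead
    if income_pct_fpl > 400.0:
        return None  #eligibility for the credit expires
    frac = (income_pct_fpl - 133.0) / (400.0 - 133.0)
    return 0.03 + frac * (0.095 - 0.03)

print(premium_cap(100))            # None (Medicaid territory)
print(premium_cap(450))            # None (above the eligibility ceiling)
print(round(premium_cap(400), 3))  # 0.095
```

The sharp cutoff at 400% FPL is worth noticing here: one extra dollar of income can eliminate the entire credit, which is part of what drives the marginal incentive debate discussed next.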
A considerable amount of attention has been paid to the subsidies mentioned above. In general, the concern is that subsidies will reduce the marginal propensity to work. It is true that marginal returns to employment are impacted by subsidies because they phase out. There are two major problems with this argument. The first is that the phase out of a subsidy is a different phenomenon from the subsidy itself. The phase out decreases the amount of income retained per marginal dollar earned up to a certain point. If the subsidy were a fixed dollar amount as opposed to being pegged to income, this disincentive would vanish altogether. The great irony here is that the folks arguing against the marginal incentives of subsidies for low-income individuals are very unlikely to make the same argument in favor of tax credits over deductions more generally. (The same marginal incentives that come from pegging insurance subsidies to income operate when pegging other tax incentives to income. This is the reason why people with higher incomes get more of a reduction in tax liability for a given deduction than people who claim the same deduction at a lower income. Tax credits, on the other hand, do not vary with income.)
A second, but related, concern is that subsidies simply make people lazy. The response to this is why should insurance subsidies be treated any differently? In the US we subsidize all kinds of things through the tax code (for better or worse). In fact, these subsidies (a.k.a. tax expenditures) are actually quite massive relative to the size of the budget. Furthermore, they disproportionately favor high-income taxpayers because 1) they are more likely to itemize, 2) (again) deductions are more valuable at higher income ranges, and 3) they are often for activities disproportionately represented at the higher income scales. The Congressional Budget Office crafted an excellent analysis of the size of these subsidies relative to GDP last year.
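The deduction-versus-credit asymmetry described above reduces to a one-line calculation. The dollar amounts and marginal rates below are arbitrary examples, chosen only to make the comparison concrete.

```python
def deduction_value(amount, marginal_rate):
    #A deduction reduces taxable income, so its value to the filer
    #scales with the filer's marginal tax rate
    return amount * marginal_rate

def credit_value(amount):
    #A credit reduces tax liability dollar for dollar, regardless of income
    return amount

#The same $1,000 deduction is worth more to a filer in a higher bracket...
print(deduction_value(1000.0, 0.15))   # 150.0
print(deduction_value(1000.0, 0.35))   # 350.0
#...while a $1,000 credit is worth the same to both
print(credit_value(1000.0))            # 1000.0
```

This is the mechanical reason why tax expenditures delivered as deductions disproportionately favor high-income taxpayers, independent of who is more likely to itemize.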
In general, the use of subsidies (positive or negative) to effect what are deemed to be socially desirable decisions is an extremely old practice. If this practice is a bad one, we should probably focus on shutting down the biggest offenders first (including the exclusion for employer-based insurance coverage).
HTML('<iframe'+\
' src=http://www.cbo.gov/sites/default/files/cbofiles/images/pubs-images/42xxx/tax-expenditures.png'+\
' width=1100 height=820></iframe>')
Most of the media attention has focused on the problem of adverse selection and the budgetary impact of the ACA. We can address both briefly before moving on to conceptual critique offered by the American Enterprise Institute (a prominent conservative think tank that, in my opinion, far outstrips the Heritage Foundation as of late in terms of analytical rigor).
With respect to adverse selection, the concern is that not enough healthy individuals will sign up, instead opting to pay the penalty. Said differently, the penalty is not sufficient to get them to participate in the exchanges. When viewed from this perspective, it becomes clear that this is a calibration issue. It's engineering, not breakthrough research. If the penalty is not optimal, we can adjust it. This has little to do with the abstract concept of a penalty.
The budgetary impact is sometimes difficult to lock down because wildly different numbers abound. The reason for this is that commentators and Members of Congress are comparing apples and oranges. To provide hard and fast concept names, Republicans tend to talk about the cost of Obamacare, while Democrats talk about the fiscal impact. Make no mistake, both parties confusingly use the words interchangeably. In a sense, they are interchangeable, just not as they are being used here. As we are using them here, cost refers to only the spending side impacts of the ACA, while fiscal impact takes into account revenues as well. These concepts are related in the following straightforward equation:
fiscal impact \(=\) revenues \(-\) cost
When CBO originally scored the ACA, they indicated a net positive fiscal impact (the ACA reduces the deficit over the budget window). While the estimates have fluctuated somewhat, these fluctuations have been minimal. Using the best available information, as a consequence of the revenue provisions embedded in the ACA, CBO still forecasts a net positive fiscal impact.
Dr. Gottlieb of AEI sees the ACA as "a plan at war with itself." (Gottlieb, 2013) For example, as outlined in the ACA5 above, it depends on insurers to make investments while simultaneously limiting the non-service related operating margins available to said insurance providers. These inconsistencies may limit the effectiveness of the law. Gottlieb's real problem, however, lies in the push towards what are known as Accountable Care Organizations. In effect, these ACOs are integrated service delivery environments that seek to reduce the number of providers by absorbing smaller practices.
How does one get smaller practices to join large organizations? The ACA contains provisions that seek to limit reimbursement rates to providers by basically saying: you get a set amount of money to fix this person, and if the costs run over, it's on you. This is in contrast to the existing model in which the insurance company is on the hook for all services provided (to the extent they cannot argue their way out of them). The resultant exposure can be too much to absorb for smaller practices. Thus, by shifting risk from insurers to providers, a strong incentive is created to join these ACOs, which have a stronger resource base.
From the perspective of the Obama Administration, this provides an opportunity for cost control by permitting organizational managers to act as budgetary stewards. Gottlieb argues that this is a recipe for fragmented care (due to a transition from family practitioners to "shift" doctors) with a strong incentive to underprovide services. Furthermore, he argues that a similar consolidation effort failed 20 years ago because the cost reductions never materialized. The same is likely to happen now because the downward pressure on cost appears to be uncoupled from clinical outcomes (among other things).
To his credit, Gottlieb acknowledges that the Obama Administration has considered the same issues he fears. They, however, believe that the earlier failures were a result of infrastructure failures that have since been upgraded.
Gottlieb (2013) argues that in addition to the theories outlined above, the ACA was and is motivated by a strong belief that the data suggest efficiencies are there for the taking. We started above with a very high-level view, so now it would be appropriate to explore the current state of insurance coverage and implementation of the ACA.
It is useful to start with a view of the uninsured population proportion by state. (Note that the distortions are recognized, particularly Michigan. However, for quick exploratory plots, these get the message across.)
'''This is the first instance of a geopandas plot in this script. pandas DFs convert quite naturally to geopandas
GeoDataFrames (GDFs). Note the following coding conventions:
party
{0:'Republican',
1:'Democrat'}
reelect
{0:'No',
1:'Yes'}
exchange
{0:'State',
1:'Partnership',
2:'Federal'}
medicaid
{0:'No',
1:'Leaning Toward Not Expanding',
2:'Leaning Toward Expanding',
3:'Yes'}
'''
#Set working directory
workdir='/home/choct155/dissertation/MiscProj/'
#Read in data
aca=pd.read_csv(workdir+'aca_state.csv').set_index('state')
#Read in shape file
states=gp.GeoDataFrame.from_file(workdir+'tl_2013_us_state.shp').set_index('STUSPS')
#Identify states (common elements of aca and states, which exclude territories and DC)
st_set=set(states.index) & set(aca.index)
#Drop Alaska and Hawaii
st_set_sub=[x for x in list(sorted(st_set)) if x not in ['AK','HI']]
#Join on states
st_aca=gp.GeoDataFrame(states.join(aca).ix[st_set_sub])
# HTML(st_aca[st_aca.columns[4:]].head().to_html())
'''We will first plot the uninsured population as a percentage of each state's total population.
Darker colors indicate higher proportions of uninsured'''
#Set plot size
plt.rcParams['figure.figsize']=18,12
#Create plotting object
fig,axes=plt.subplots(1)
#Plot uninsured population by state
st_aca.plot(column='unins',colormap='Reds',axes=axes)
#Get rid of chart junk (axes)
axes.set_axis_off()
#Set title
axes.set_title('Proportion of Uninsured by State (2013)',fontsize=22);
The map reveals some strong regional disparities in insurance coverage. Here are the ten states with the lowest proportions of uninsured residents...
HTML(DataFrame(st_aca['unins'].order()).head(10).to_html())
...and the ten states with the highest proportions.
HTML(DataFrame(st_aca['unins'].order()).tail(10).to_html())
There are a couple interesting things to note here. First, Massachusetts, the recognized test bed for the ACA, has the lowest proportion by far. The second is the distribution of uninsured individuals by party. Here is a map of the governor's party by state in 2013.
#Create plotting object
fig,axes=plt.subplots(1)
#Plot uninsured population by state
st_aca.plot(column='party',colormap='RdBu',axes=axes)
#Get rid of chart junk (axes)
axes.set_axis_off()
#Set title
axes.set_title('Party of Each Governor (2013)',fontsize=22);
Do we see differences in the uninsured across party lines?
#Set plot size
plt.rcParams['figure.figsize']=15,6
#Capture uninsured data by party
REP=st_aca['unins'][st_aca['party']==0].values
DEM=st_aca['unins'][st_aca['party']==1].values
#Generate plot object
fig,axes=plt.subplots(1)
#Plot notched boxplot
axes.boxplot([REP,DEM],vert=False,notch=True,bootstrap=1000,widths=[.02*len(REP),.02*len(DEM)])
#Overlay scatter plot of original data
axes.scatter(st_aca['unins'],st_aca['party'] + np.random.normal(1,.015,len(st_aca['party'])),alpha=.5,s=50)
#Calculate means by party
mean_unins=[REP.mean(),DEM.mean()]
axes.scatter(mean_unins,[1,2],c='r',s=150,alpha=.5,marker='D')
#Set labels
axes.set_yticklabels(['R','D'])
plt.xlabel('Uninsured Proportion of the Population')
plt.ylabel('Party of the Governor')
plt.title('Uninsured Populations by Party of the Governor');
'''We are just calculating a simple t-test of the means here'''
from scipy import stats
#Perform t-test
t=stats.ttest_ind(REP,DEM,equal_var=False)
#Share results
print 'THE p-VALUE OF A t-TEST OF THE MEANS IS ',t[1]
To unpack the diagram a bit, we see boxplots of the uninsured data, split by the party of the Governor. The vertical red line indicates the median, while the red diamond indicates the mean for each group. The blue circles are the individual data points from the original information.
Interestingly enough, while the average uninsured population appears to be greater (with a high degree of confidence) in Republican states, the typical (a.k.a. median) cases are quite similar. One might reasonably ask why we are looking at this in the first place. The reason is to establish whether disparities in the target population impact implementation decisions.
Observe the distribution of implementation efforts by state. For the arrangement of state exchanges, there are three groups (sorry, the colors are a little backwards on this one):
For the decision to expand Medicaid, there are four groups:
To put this in context, Burke & Kamarck (2013) suggests that states that do not implement their own exchange and have opted against expanding Medicaid may be seen as obstructing the implementation of the law.
#Set plot size
plt.rcParams['figure.figsize']=18,20
#Create plotting object
fig,axes=plt.subplots(2)
#Plot uninsured population by state
st_aca.plot(column='exchange',colormap='RdBu',axes=axes[0])
st_aca.plot(column='medicaid',colormap='RdBu',axes=axes[1])
#Get rid of chart junk (axes)
axes[0].set_axis_off()
axes[1].set_axis_off()
#Set title
axes[0].set_title('Arrangement of State Exchange',fontsize=22)
axes[1].set_title('Decision to Expand Medicaid',fontsize=22);
What we notice here is a strong partisan divide. This divide is far from surprising, but boxplots can again help us see how wide the gap is with respect to these provisions.
#Set plot size
plt.rcParams['figure.figsize']=15,12
#Capture data by party
REP_ex=st_aca['exchange'][(st_aca['exchange'].notnull()) & (st_aca['party']==0)].values
DEM_ex=st_aca['exchange'][(st_aca['exchange'].notnull()) & (st_aca['party']==1)].values
REP_med=st_aca['medicaid'][(st_aca['medicaid'].notnull()) & (st_aca['party']==0)].values
DEM_med=st_aca['medicaid'][(st_aca['medicaid'].notnull()) & (st_aca['party']==1)].values
#Generate plot object
fig,axes=plt.subplots(2)
#Plot notched boxplot
axes[0].boxplot([REP_ex,DEM_ex],vert=False,notch=True,bootstrap=1000,widths=[.02*len(REP_ex),.02*len(DEM_ex)])
axes[1].boxplot([REP_med,DEM_med],vert=False,notch=True,bootstrap=1000,widths=[.02*len(REP_med),.02*len(DEM_med)])
#Overlay scatter plot of original data
axes[0].scatter(st_aca['exchange'],st_aca['party'] + np.random.normal(1,.025,len(st_aca['party'])),alpha=.5,s=50)
axes[1].scatter(st_aca['medicaid'],st_aca['party'] + np.random.normal(1,.025,len(st_aca['party'])),alpha=.5,s=50)
#Calculate means by party
mean_ex=[REP_ex.mean(),DEM_ex.mean()]
axes[0].scatter(mean_ex,[1,2],c='r',s=150,alpha=.5,marker='D')
mean_med=[REP_med.mean(),DEM_med.mean()]
axes[1].scatter(mean_med,[1,2],c='r',s=150,alpha=.5,marker='D')
#Set labels
axes[0].set_yticklabels(['R','D'])
axes[1].set_yticklabels(['R','D'])
# axes[0].xlabel('Uninsured Proportion of the Population')
axes[0].set_ylabel('Party of the Governor')
axes[1].set_ylabel('Party of the Governor')
axes[0].set_title('Arrangement of State Exchange')
axes[1].set_title('Decision to Expand Medicaid');
Unlike the comparison across parties for the uninsured population, we see stark differences for both the mean and median cases in implementation. Nearly all Republican states have elected not to set up their own exchange while most Democratic states have done so. Further, nearly all Democratic states have elected to expand Medicaid while most Republican states either chose not to do so, or are leaning in that direction.
This is not the kind of split one expects to see if each decision were based simply on the empirical needs of each respective state. Again, this is not surprising, but it does throw into relief the nature of the debate, and the remarkable disconnect between the merits of a given policy and the likelihood that it will be pursued.
Contemporary efforts at health reform universally feature fiscal implications as a key (if not the primary) motivation. As pointed out above, we are justifiably concerned that health expenditures will eat into our ability to advance other policy goals. The ACA, somewhat paradoxically, actually attempts to increase the amount of health services that can be consumed for millions of people. It is not unreasonable to suspect that the reduction in expenditures that may materialize due to taxes on "cadillac" health plans will not fully offset the expenditure increase that is required to get currently uninsured individuals the statutorily defined minimum level of services required of ACA-type insurance plans. One might ask, if we want to lower health expenditures, why are we trying to buy more health services?
This is a complex question with many moving parts. We will not go into detail here, but it suffices to say that the ACA assumes that we can lower the cost of health care provision. We can "bend the cost curve" as it were. The implicit assumption here is that inefficiencies exist that can be addressed with clever policy. There are at least three types of inefficiencies that the ACA targets.
Emergency facility care is very expensive. The designers of the ACA believe that the demand for emergency care can be reduced by providing regular health access and preventative care. The idea here is that if given the choice, people will (on average) address health concerns before they require a trip to the emergency room.
The distributed "fee for service" model is overly costly because doctors are incentivized to provide expensive, but unnecessary tests. Furthermore, return visits for patients are costly, and the current reimbursement scheme does not incentivize "getting it right the first time through."
Lack of transparency and competition in the market create disparities that allow market prices to vary in a manner largely uncoupled from the actual production costs for health services.
I do not currently have data that would facilitate exploration of the first two considerations, but we can take a peek at Medicare Provider Charge Data to gain partial insight into the last one.
'''Here we can read in and briefly examine the data provided by CMS'''
#Establish data location
data_dir='/home/choct155/dissertation/MiscData/'
#Read in data
inpat=pd.read_csv(data_dir+'Medicare_Provider_Charge_Inpatient_DRG100_FY2011.csv')
outpat=pd.read_csv(data_dir+'Medicare_Provider_Charge_Outpatient_APC30_CY2011_v2.csv')
print inpat.head()
print outpat.head()
In the interests of full disclosure, there are a couple of items to note:
This is a reasonably rich data set. We will only briefly consider some exploratory views of the data, but it would not be reasonable to assume a high degree of inferential confidence without stricter construction of controls. In other words, the view we see here would be suggestive of what is actually going on, but should not be taken as gospel. This is meant to be just a taste, and not an exhaustive review of even the range of descriptive views of the data.
Gottlieb (2013) appears to have external validity concerns with this data, which is not altogether unreasonable from what I can tell. However, we often work with limited data, and as far as I know this is the best publicly available data of its kind. I am more than happy to accommodate different sources if they should arise. In general, cautiously used data with limitations is far superior to no data at all.
So, what are we looking for here? In a nutshell, we are looking for variation in the cost of the same procedures. If competition and transparency are deficient in the health services market, we would not expect to see convergence to common prices. In reality, some variation will occur for reasons other than the basic cost of service provision. For example, we will see differences across geographic space due to real estate differentials that in turn create variation in overhead costs.
In any case, a good place to start would be to provide minimal information on the data. There are two separate datasets, one for inpatient charges, and one for outpatient charges. Here are the available variables.
print '***INPATIENT***\n',inpat
print '\n***OUTPATIENT***\n',outpat
As can be seen, there are 163,065 records for the inpatient set and 43,372 records for the outpatient set. It appears that each record couples a given procedure with a provider. Each procedure/provider combination is coupled with average charge information and some attribute data about the provider among other things.
How many procedures are represented in each set?
print 'THERE ARE ',len(set(inpat['DRG Definition'])),' PROCEDURES IN THE INPATIENT SET'
print 'THERE ARE ',len(set(outpat['APC'])),' PROCEDURES IN THE OUTPATIENT SET'
How many providers are represented?
print 'THERE ARE ',len(set(inpat['Provider Id'])),' PROVIDERS IN THE INPATIENT SET'
print 'THERE ARE ',len(set(outpat['Provider Id'])),' PROVIDERS IN THE OUTPATIENT SET'
A few thousand providers will give us a nice distributional view of charges. We should be careful to consider the spatial distribution of these records (procedure/provider combinations). Are we heavily weighted in one region of the country versus another? The data all come with a zip code attribute, so we can use a Census shapefile that has better resolution than states alone.
'''The zip code shapefile is large, so we will load the data in a standalone cell'''
zip_shp=gp.GeoDataFrame.from_file(data_dir+'tl_2013_us_zcta510.shp')
'''We now need to join the Medicare data to the zip code shapefile. We will effectively be creating a spatial
histogram, so we first need to group to zip codes, and extract the count. This summary DF will then be joined to
the zip code shapefile.'''
#Create new integer version of zip codes in zip GDF
zip_shp['zip_int']=[int(x) for x in zip_shp['ZCTA5CE10']]
#Set the index
zip_shp2=zip_shp.set_index('zip_int')
#Define mask to exclude Alaska and Hawaii (~ negates the boolean mask)
in_ak_hi_mask=~inpat['Provider State'].isin(['AK','HI'])
out_ak_hi_mask=~outpat['Provider State'].isin(['AK','HI'])
#Groupby zip code and count
in_zip=inpat[in_ak_hi_mask].groupby('Provider Zip Code').count()
out_zip=outpat[out_ak_hi_mask].groupby('Provider Zip Code').count()
#Join Medicare data
in_shp=gp.GeoDataFrame(zip_shp2.join(in_zip))
out_shp=gp.GeoDataFrame(zip_shp2.join(out_zip))
#Report proportion lost in the join
in_zip_loss=len(set(in_zip.index)-set(zip_shp2.index))/float(len(set(in_zip.index)))
out_zip_loss=len(set(out_zip.index)-set(zip_shp2.index))/float(len(set(out_zip.index)))
print 'THE INPATIENT JOIN LOST THE FOLLOWING PROPORTION OF ZIP CODES:', in_zip_loss
print 'THE OUTPATIENT JOIN LOST THE FOLLOWING PROPORTION OF ZIP CODES:', out_zip_loss
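The join-loss proportion reported above is just a set difference between the zip codes in the Medicare data and those in the shapefile. A minimal sketch of the same calculation on invented zip codes:

```python
# Toy illustration of the join-loss proportion (all zip codes invented)
data_zips = {20001, 20002, 20003, 99999}   # zips present in the CMS data
shape_zips = {20001, 20002, 20003, 20004}  # zips present in the shapefile

# Data zips with no shapefile match are dropped by the join on the shapefile
lost = data_zips - shape_zips
loss_prop = len(lost) / float(len(data_zips))
# one of the four data zips is unmatched, so loss_prop = 0.25
```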
'''We can now plot the zip code level inpatient data...'''
#Set plot size
plt.rcParams['figure.figsize']=20,12
#Set line thickness
plt.rcParams['lines.linewidth']=.001
#Create plotting object
fig,axes=plt.subplots(1)
#Plot inpatient record coverage by zip code
in_shp[in_shp['Provider Id'].notnull()].plot(column='Provider Id',colormap='RdBu',axes=axes)
#Get rid of chart junk (axes)
axes.set_axis_off()
#Set title
axes.set_title('Inpatient Record Coverage',fontsize=22);
'''...followed by the zip code level outpatient data'''
#Set plot size
plt.rcParams['figure.figsize']=18,11
#Set line thickness
plt.rcParams['lines.linewidth']=.001
#Create plotting object
fig,axes=plt.subplots(1)
#Plot outpatient record coverage by zip code
out_shp[out_shp['Provider Id'].notnull()].plot(column='Provider Id',colormap='RdBu',axes=axes)
#Get rid of chart junk (axes)
axes.set_axis_off()
#Set title
axes.set_title('Outpatient Record Coverage',fontsize=22);
Hexbin maps probably would be a bit cleaner for showing this coverage, but the labor-return ratio just isn't quite there for this purpose. In any case, it appears the zip code distribution is quite similar (if not identical) across the inpatient and outpatient sources. There also appears to be a strong eastward geographic bias in the sample. This isn't the end of the world, but certainly something to keep in mind.
It would be nice to get a clearer view of the distribution of records per zip code. We filtered Alaska and Hawaii out of the histogram maps above because their distance from the "lower 48" forced the map extent to expand, shrinking the lower 48 until little could be seen. We do not need to filter these states out when viewing the data non-spatially.
'''We are going to recreate the groupby objects (counting records within zip codes) without filtering Alaska
and Hawaii. Then we can view the distributional shape of records per zip code.'''
#Groupby zip code and count
in_zip2=inpat.groupby('Provider Zip Code').count()
out_zip2=outpat.groupby('Provider Zip Code').count()
#Set plot size
plt.rcParams['figure.figsize']=15,8
#Set color palette
c1,c2,c3,c4,c5=seaborn.color_palette('Set1',5)
#Generate plot object
fig,axes=plt.subplots(1)
#Plot kernel density of input and outpatient data
seaborn.kdeplot(in_zip2['Provider Id'].astype(float).values,shade=True,color=c4,ax=axes,label='Inpatient')
seaborn.kdeplot(out_zip2['Provider Id'].astype(float).values,shade=True,color=c5,ax=axes,label='Outpatient')
#Set title
axes.set_title('Distribution of Records per Zip Code',fontsize=22)
#Set facecolor
axes.patch.set_facecolor('white')
While it is not surprising that we should have more zip codes with many records in the inpatient set (there are more inpatient records to begin with), it is interesting that the inpatient distribution is so much flatter. It suggests a much heavier influence of dominant zip codes in the inpatient data. We could explore these properties a bit further, but for the purpose of this Notebook, we have a sense of things to consider when evaluating the distribution of charges.
As noted above, there are 100 procedures in the inpatient set and 30 procedures in the outpatient set. We will not explore all of them here, but if specific inquiries are made, we could conceivably look into them further. In the interests of time, we will focus on four cases:
The idea is to capture variation in categorically different locations in the frequency distribution. Hopefully this will give a more representative picture than just focusing on the most common procedure. The first step is identifying the procedures that fit in each of these buckets.
'''To identify the procedures at each of these distributional positions, we need value counts for all procedures.
We can then rely on the pandas built-in describe function to provide the relevant counts, and select the procedures
associated with said counts. Note that the values returned by describe may not correspond with actual values in
the value counts. When this occurs, we just use the closest value.'''
#Count instances of each procedure
proc_counts=inpat['DRG Definition'].value_counts()
#Identify counts at each distributional position (call describe once and index into it)
count_stats=proc_counts.describe()
max_c=count_stats['max']
x75=count_stats['75%']
x50=count_stats['50%']
x25=count_stats['25%']
#Define function to find the closest actual value in proc_counts
def find_nearest(seq,val):
    idx=(np.abs(seq-val)).argmin()
    return seq[idx]
#Identify associated procedure
proc_max=proc_counts[proc_counts==find_nearest(proc_counts.values,max_c)].index[0]
proc_75=proc_counts[proc_counts==find_nearest(proc_counts.values,x75)].index[0]
proc_50=proc_counts[proc_counts==find_nearest(proc_counts.values,x50)].index[0]
proc_25=proc_counts[proc_counts==find_nearest(proc_counts.values,x25)].index[0]
print '***PROCEDURES AT PRESCRIBED DISTRIBUTIONAL POSITIONS***\n'
print 'MOST COMMON:',proc_max
print '75%:',proc_75
print '50%:',proc_50
print '25%:',proc_25
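As a quick sanity check of the `find_nearest` helper, here it is applied to a small toy array (values are arbitrary):

```python
import numpy as np

def find_nearest(seq, val):
    # Index of the element closest to val, then return that element
    idx = (np.abs(seq - val)).argmin()
    return seq[idx]

counts = np.array([10, 40, 95, 200])
nearest = find_nearest(counts, 100)  # 95 is the closest value to 100
```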
Now we can subset by each of these procedures and check out the variation in charges for each.
'''We are just subsetting the inpatient data by procedure here'''
#Subset by each of the procedures
in_max=inpat[inpat['DRG Definition']==proc_max]
in_75=inpat[inpat['DRG Definition']==proc_75]
in_50=inpat[inpat['DRG Definition']==proc_50]
in_25=inpat[inpat['DRG Definition']==proc_25]
in_max.head()
There are three important data definitions that can be found here, but will be reiterated anyway:
We will ask three basic questions, each of which could be explored much further than will occur here.
The first is simple enough to capture with kernel density plots.
'''We are just making more density plots as we did above'''
#Set plot size
plt.rcParams['figure.figsize']=15,16
#Generate plot object
fig,axes=plt.subplots(2)
#Plot kernel density of input and outpatient data
seaborn.kdeplot(in_max[' Average Covered Charges '].astype(float).values,shade=True,color=c1,ax=axes[0],label='Max')
seaborn.kdeplot(in_75[' Average Covered Charges '].astype(float).values,shade=True,color=c2,ax=axes[0],label='75%')
seaborn.kdeplot(in_50[' Average Covered Charges '].astype(float).values,shade=True,color=c3,ax=axes[0],label='50%')
seaborn.kdeplot(in_25[' Average Covered Charges '].astype(float).values,shade=True,color=c4,ax=axes[0],label='25%')
seaborn.kdeplot(in_max[' Average Total Payments '].astype(float).values,shade=True,color=c1,ax=axes[1],label='Max')
seaborn.kdeplot(in_75[' Average Total Payments '].astype(float).values,shade=True,color=c2,ax=axes[1],label='75%')
seaborn.kdeplot(in_50[' Average Total Payments '].astype(float).values,shade=True,color=c3,ax=axes[1],label='50%')
seaborn.kdeplot(in_25[' Average Total Payments '].astype(float).values,shade=True,color=c4,ax=axes[1],label='25%')
#Set title
axes[0].set_title('Distribution of Average Covered Charges by Procedure',fontsize=22)
axes[1].set_title('Distribution of Average Payments by Procedure',fontsize=22)
#Set facecolor
axes[0].patch.set_facecolor('white')
axes[1].patch.set_facecolor('white')
So why are we looking at the total distribution of charges and payments? It gives us a baseline from which to judge variation across and within states. We also have a sense of which procedures may have more inherent variation. The interesting thing to note in the plots above is that variation does not appear to be driven entirely by the frequency of the procedure. We would expect that with increasing frequency the procedure would become more standardized with respect to protocol and price. In fact, the least frequent procedure in the group has the tightest distribution. In other words, it varies in cost the least. Further, it is interesting to note that the most variation comes from the procedure in the 75th percentile position.
This is clearly a high-level observation and would need further exploration to verify, but it is quite suggestive. What about variation in average charges across states? To evaluate this, we need a measure of the average charges and payments by state. The charge and payment data are pegged to the discharges, so we can use these as weights in the construction of a weighted average for each state.
'''To construct the weighted averages of charges and payments by state, we will utilize the trusty 'split-apply-combine'
approach. (See Hadley Wickham's plyr package if you are unfamiliar:
http://www.r-bloggers.com/a-fast-intro-to-plyr-for-r/ )
We will split each DF by state, calculate the weighted sums using discharges as weights for both payments and charges,
recombine the states into a single DF, and join it with our state shapefile.'''
def wt_avg(df):
    #Generate list of states
    st_list=sorted(set(df['Provider State']))
    #Generate lists for average data
    charge_list=[]
    pay_list=[]
    #For each state in the procedure specific subset...
    for st in st_list:
        #...subset by the state (note: df, not in_max, so the function works for any subset)...
        st_sub=df[df['Provider State']==st]
        #...calculate total charges and payments...
        st_sub['tot_charge']=st_sub[' Total Discharges ']*st_sub[' Average Covered Charges ']
        st_sub['tot_pay']=st_sub[' Total Discharges ']*st_sub[' Average Total Payments ']
        #...divide total charges and payments by total discharges...
        avg_charge=st_sub['tot_charge'].sum()/st_sub[' Total Discharges '].sum()
        avg_pay=st_sub['tot_pay'].sum()/st_sub[' Total Discharges '].sum()
        #...and throw the resultant values in a list
        charge_list.append(avg_charge)
        pay_list.append(avg_pay)
    #Construct a DF from the output lists
    avg_df=DataFrame({'state':st_list,
                      'charges':charge_list,
                      'payments':pay_list}).set_index('state')
    return avg_df
#Join each summary DF with state shapefile
imax_geo=gp.GeoDataFrame(states.join(wt_avg(in_max)))
i75_geo=gp.GeoDataFrame(states.join(wt_avg(in_75)))
i50_geo=gp.GeoDataFrame(states.join(wt_avg(in_50)))
i25_geo=gp.GeoDataFrame(states.join(wt_avg(in_25)))
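The loop in `wt_avg` is a classic split-apply-combine; the same discharge-weighted average can be sketched directly with a pandas groupby. A minimal toy example (column names mirror the CMS fields, all values invented):

```python
import pandas as pd

# Toy frame mimicking the CMS columns (all values invented for illustration)
toy = pd.DataFrame({
    'Provider State': ['VA', 'VA', 'MD'],
    ' Total Discharges ': [10, 30, 20],
    ' Average Covered Charges ': [100.0, 200.0, 150.0],
})

# Discharge-weighted average charge by state: sum of (discharges * avg charge)
# within each state, divided by total discharges in that state
tot_charge = toy[' Total Discharges '] * toy[' Average Covered Charges ']
wt = tot_charge.groupby(toy['Provider State']).sum() / \
     toy.groupby('Provider State')[' Total Discharges '].sum()
# VA: (10*100 + 30*200)/40 = 175.0; MD: (20*150)/20 = 150.0
```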
We need to see this data in both spatial and non-spatial formats. First the charges...
#Set plot size
plt.rcParams['figure.figsize']=18,54
#Create plotting object
fig,axes=plt.subplots(5)
#Plot covered charges by state
gp.GeoDataFrame(imax_geo.ix[st_set_sub]).plot(column='charges',colormap='Reds',axes=axes[0])
gp.GeoDataFrame(i75_geo.ix[st_set_sub]).plot(column='charges',colormap='Blues',axes=axes[1])
gp.GeoDataFrame(i50_geo.ix[st_set_sub]).plot(column='charges',colormap='Greens',axes=axes[2])
gp.GeoDataFrame(i25_geo.ix[st_set_sub]).plot(column='charges',colormap='Purples',axes=axes[3])
seaborn.kdeplot(imax_geo['charges'].astype(float).values,shade=True,color=c1,ax=axes[4],label='Max')
seaborn.kdeplot(i75_geo['charges'].astype(float).values,shade=True,color=c2,ax=axes[4],label='75%')
seaborn.kdeplot(i50_geo['charges'].astype(float).values,shade=True,color=c3,ax=axes[4],label='50%')
seaborn.kdeplot(i25_geo['charges'].astype(float).values,shade=True,color=c4,ax=axes[4],label='25%')
#Get rid of chart junk (axes)
axes[0].set_axis_off()
axes[1].set_axis_off()
axes[2].set_axis_off()
axes[3].set_axis_off()
#Set title
axes[0].set_title('Charges by State - Most Common Procedure',fontsize=22)
axes[1].set_title('Charges by State - 75th Percentile Procedure',fontsize=22)
axes[2].set_title('Charges by State - 50th Percentile Procedure',fontsize=22)
axes[3].set_title('Charges by State - 25th Percentile Procedure',fontsize=22)
axes[4].set_title('Distribution of Charges by State',fontsize=22)
#Set facecolor
axes[4].patch.set_facecolor('white')
...and now the payments data.
#Set plot size
plt.rcParams['figure.figsize']=18,54
#Create plotting object
fig,axes=plt.subplots(5)
#Plot covered charges by state
gp.GeoDataFrame(imax_geo.ix[st_set_sub]).plot(column='payments',colormap='Reds',axes=axes[0])
gp.GeoDataFrame(i75_geo.ix[st_set_sub]).plot(column='payments',colormap='Blues',axes=axes[1])
gp.GeoDataFrame(i50_geo.ix[st_set_sub]).plot(column='payments',colormap='Greens',axes=axes[2])
gp.GeoDataFrame(i25_geo.ix[st_set_sub]).plot(column='payments',colormap='Purples',axes=axes[3])
seaborn.kdeplot(imax_geo['payments'].astype(float).values,shade=True,color=c1,ax=axes[4],label='Max')
seaborn.kdeplot(i75_geo['payments'].astype(float).values,shade=True,color=c2,ax=axes[4],label='75%')
seaborn.kdeplot(i50_geo['payments'].astype(float).values,shade=True,color=c3,ax=axes[4],label='50%')
seaborn.kdeplot(i25_geo['payments'].astype(float).values,shade=True,color=c4,ax=axes[4],label='25%')
#Get rid of chart junk (axes)
axes[0].set_axis_off()
axes[1].set_axis_off()
axes[2].set_axis_off()
axes[3].set_axis_off()
#Set title
axes[0].set_title('Payments by State - Most Common Procedure',fontsize=22)
axes[1].set_title('Payments by State - 75th Percentile Procedure',fontsize=22)
axes[2].set_title('Payments by State - 50th Percentile Procedure',fontsize=22)
axes[3].set_title('Payments by State - 25th Percentile Procedure',fontsize=22)
axes[4].set_title('Distribution of Payments by State',fontsize=22)
#Set facecolor
axes[4].patch.set_facecolor('white')
Wow. There is considerable variation across states, but almost no variation across procedures. This suggests that 1) there are strong state-specific factors heavily influencing the payment structure in each state (perhaps an expected finding), and 2) (again) frequency appears to play an insignificant role in the distribution of pricing. Both of these findings are consistent with the idea that substantial barriers to efficient market clearing exist.
If these barriers do exist, the ACA's focus on provider transparency and exchange-based competition could bear real fruit. It would also suggest that the Republican proposal to enable the purchase of insurance across state borders could be quite productive.
What about variation within states? First the charge data...
'''We will use a simple boxplot to capture differences by state. Coloring by party of the Governor would also be
useful to capture any partisan differences that may exist. To do this, we will join in party information, map party
colors, and plot the results.
Note that we also care about the order of the states. If they are arranged by average value, the information is
easier to absorb, and we may better pick up any trends that exist.
'''
#Join party information in with cost data
inpat_col=inpat.set_index('Provider State').join(aca['party'])
#Capture order of states (by mean charge)
order1=list(inpat_col[' Average Covered Charges '].groupby(level=0).mean().order().index)
#Map colors to each state
color_map={0:'#DE2D26',
1:'#3182BD'}
inpat_col['color']=inpat_col['party'].map(color_map)
#Capture a smaller list with one record for each state (and the associated color)
col_by_state1=inpat_col.groupby(level=0).last()['color'].fillna('#CCCCCC').ix[order1]
#Generate a Seaborn color palette
party_colors1=seaborn.color_palette(list(col_by_state1.values),len(col_by_state1))
#Set plot size
plt.rcParams['figure.figsize']=18,25
#Generate plot object
fig,axes=plt.subplots(1)
#Plot variation in total charges by state
seaborn.boxplot(inpat_col[' Average Covered Charges '],inpat_col.index,color=party_colors1,order=order1,vert=False,ax=axes)
#Set title
axes.set_title('Variation in Charges by State',fontsize=22)
#Set facecolor
axes.patch.set_facecolor('white')
...and then the payments.
#Capture order of states (by mean charge)
order2=list(inpat_col[' Average Total Payments '].groupby(level=0).mean().order().index)
#Capture a smaller list with one record for each state (and the associated color)
col_by_state2=inpat_col.groupby(level=0).last()['color'].fillna('#CCCCCC').ix[order2]
#Generate a Seaborn color palette
party_colors2=seaborn.color_palette(list(col_by_state2.values),len(col_by_state2))
#Set plot size
plt.rcParams['figure.figsize']=18,25
#Generate plot object
fig,axes=plt.subplots(1)
#Plot variation in total charges by state
seaborn.boxplot(inpat_col[' Average Total Payments '],inpat_col.index,color=party_colors2,order=order2,vert=False,ax=axes)
#Set title
axes.set_title('Variation in Payments by State',fontsize=22)
#Set facecolor
axes.patch.set_facecolor('white')
These high-level views provide a sense of the baseline variation in each state. As can be seen, there are significant differences across states. Correlations with party appear to be more meaningful for variation in payments than for variation in charges. For some reason, Democratic states appear to have higher variance in their payment structure.
What we really want to see, however, is variance within procedure. We can use the coefficient of variation as a nice summary measure. As a consequence of normalizing variation by the mean, it facilitates comparisons across groups.
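As a quick worked example of the coefficient of variation (the standard deviation divided by the mean), on invented charge values:

```python
import numpy as np

# Invented charge values for a single hypothetical state
charges = np.array([100.0, 110.0, 90.0, 100.0])

# CoV = standard deviation normalized by the mean (population std, ddof=0)
cov = charges.std() / charges.mean()
# mean = 100; std = sqrt((0 + 100 + 100 + 0)/4) = sqrt(50) ~ 7.07; cov ~ 0.0707
```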
'''We will again rely on split-apply-combine to find the CoV within each state. We will split the data by state,
divide the charge and payment standard deviations by their respective means, and recombine the data for plotting.'''
def CoV(df):
    #Generate list of states
    st_list=sorted(set(df['Provider State']))
    #Generate lists for average data
    charge_list=[]
    pay_list=[]
    #For each state in the procedure specific subset...
    for st in st_list:
        #...subset by the state (note: df, not in_max, so the function works for any subset)...
        st_sub=df[df['Provider State']==st]
        #...calculate total charge and payment means...
        charge_mean=st_sub[' Average Covered Charges '].mean()
        pay_mean=st_sub[' Average Total Payments '].mean()
        #...calculate total charge and payment standard deviations...
        charge_std=st_sub[' Average Covered Charges '].std()
        pay_std=st_sub[' Average Total Payments '].std()
        #...divide charge and payment standard deviations by their means...
        cov_charge=charge_std/charge_mean
        cov_pay=pay_std/pay_mean
        #...and throw the resultant values in a list
        charge_list.append(cov_charge)
        pay_list.append(cov_pay)
    #Construct a DF from the output lists
    cov_df=DataFrame({'state':st_list,
                      'charges':charge_list,
                      'payments':pay_list}).set_index('state')
    return cov_df
#Join each summary DF with state shapefile
imax_cov_geo=gp.GeoDataFrame(states.join(CoV(in_max)))
i75_cov_geo=gp.GeoDataFrame(states.join(CoV(in_75)))
i50_cov_geo=gp.GeoDataFrame(states.join(CoV(in_50)))
i25_cov_geo=gp.GeoDataFrame(states.join(CoV(in_25)))
Now we can once again plot the spatial and non-spatial distributions.
#Set plot size
plt.rcParams['figure.figsize']=18,54
#Create plotting object
fig,axes=plt.subplots(5)
#Plot covered charges by state
gp.GeoDataFrame(imax_cov_geo.ix[st_set_sub]).plot(column='charges',colormap='Reds',axes=axes[0])
gp.GeoDataFrame(i75_cov_geo.ix[st_set_sub]).plot(column='charges',colormap='Blues',axes=axes[1])
gp.GeoDataFrame(i50_cov_geo.ix[st_set_sub]).plot(column='charges',colormap='Greens',axes=axes[2])
gp.GeoDataFrame(i25_cov_geo.ix[st_set_sub]).plot(column='charges',colormap='Purples',axes=axes[3])
seaborn.kdeplot(imax_cov_geo['charges'].astype(float).values,shade=True,color=c1,ax=axes[4],label='Max')
seaborn.kdeplot(i75_cov_geo['charges'].astype(float).values,shade=True,color=c2,ax=axes[4],label='75%')
seaborn.kdeplot(i50_cov_geo['charges'].astype(float).values,shade=True,color=c3,ax=axes[4],label='50%')
seaborn.kdeplot(i25_cov_geo['charges'].astype(float).values,shade=True,color=c4,ax=axes[4],label='25%')
#Get rid of chart junk (axes)
axes[0].set_axis_off()
axes[1].set_axis_off()
axes[2].set_axis_off()
axes[3].set_axis_off()
#Set title
axes[0].set_title('Variation in Charges by State - Most Common Procedure',fontsize=22)
axes[1].set_title('Variation in Charges by State - 75th Percentile Procedure',fontsize=22)
axes[2].set_title('Variation in Charges by State - 50th Percentile Procedure',fontsize=22)
axes[3].set_title('Variation in Charges by State - 25th Percentile Procedure',fontsize=22)
axes[4].set_title('Distribution of Charge CoV by State',fontsize=22)
#Set facecolor
axes[4].patch.set_facecolor('white')
#Set plot size
plt.rcParams['figure.figsize']=18,54
#Create plotting object
fig,axes=plt.subplots(5)
#Plot covered charges by state
gp.GeoDataFrame(imax_cov_geo.ix[st_set_sub]).plot(column='payments',colormap='Reds',axes=axes[0])
gp.GeoDataFrame(i75_cov_geo.ix[st_set_sub]).plot(column='payments',colormap='Blues',axes=axes[1])
gp.GeoDataFrame(i50_cov_geo.ix[st_set_sub]).plot(column='payments',colormap='Greens',axes=axes[2])
gp.GeoDataFrame(i25_cov_geo.ix[st_set_sub]).plot(column='payments',colormap='Purples',axes=axes[3])
seaborn.kdeplot(imax_cov_geo['payments'].astype(float).values,shade=True,color=c1,ax=axes[4],label='Max')
seaborn.kdeplot(i75_cov_geo['payments'].astype(float).values,shade=True,color=c2,ax=axes[4],label='75%')
seaborn.kdeplot(i50_cov_geo['payments'].astype(float).values,shade=True,color=c3,ax=axes[4],label='50%')
seaborn.kdeplot(i25_cov_geo['payments'].astype(float).values,shade=True,color=c4,ax=axes[4],label='25%')
#Get rid of chart junk (axes)
axes[0].set_axis_off()
axes[1].set_axis_off()
axes[2].set_axis_off()
axes[3].set_axis_off()
#Set title
axes[0].set_title('Variation in Payments by State - Most Common Procedure',fontsize=22)
axes[1].set_title('Variation in Payments by State - 75th Percentile Procedure',fontsize=22)
axes[2].set_title('Variation in Payments by State - 50th Percentile Procedure',fontsize=22)
axes[3].set_title('Variation in Payments by State - 25th Percentile Procedure',fontsize=22)
axes[4].set_title('Distribution of Payment CoV by State',fontsize=22)
#Set facecolor
axes[4].patch.set_facecolor('white')
We see very similar patterns to the variation across states, in the sense that the frequency with which a given procedure is provided does not seem to be a factor. On the other hand, there are large variations in the consistency of charges and payments within states. This is only modestly true for charges, but payments exhibit remarkable variation in this regard.
The ACA is predicated on the idea that there are market inefficiencies that can be addressed that would increase the productivity of health expenditure. This has to happen to make expanding health care to more citizens a feasible prospect. The law seeks to do this by providing incentives that push health care providers into large integrated environments called Accountable Care Organizations. These incentives are passed largely through the tax code, in a manner consistent with our country's pursuit of many unrelated policy goals.
There does not appear to be any a priori reason why this cannot yield some benefits. Gottlieb's (2013) concerns about historical efforts in a similar vein may be somewhat tempered by updated infrastructure in the health services and related industries. Further, the experience of Massachusetts suggests that this type of reform has the potential to achieve stated policy goals (expanded insurance coverage, and greater utilization of health resources) while not imploding the private market for insurance (Gruber 2011).
All that being said, the ACA has a much more complex task than that which was undertaken in Massachusetts because it must coordinate the disparate interests and levels of implementation effort of the various states. Indeed, some states have vowed to block the ACA even after the Supreme Court put to rest any questions of constitutionality. Getting state systems to talk to one another is challenging even in the best of times.
The potential for benefit from a reshaping of the health market appears to exist. Data limitations notwithstanding, a case can convincingly be made that our current structure impedes the properties assumed to exist in efficient markets. Furthermore, it seems clear that the status quo is not a sustainable option. Ultimately, what is needed is a way to cut through the nonsense. The ACA is well within the legal authority of Congress, so we should focus on whether or not it is policy that actually helps the population. We need an objective metric by which to assess the value added by the ACA, and Burke & Kamarck (2013) provide just that. The following is their list of measures we might use to assess performance. We may find them to be incomplete or in need of modification as time goes on, but they are as good an effort as any to evaluate the policy from a dispassionate perspective.
I would also add a sort of meta-query: to what extent are we ok with tradeoffs in these goals? The paper expands on each of these, but the idea behind the list comes through readily.
This Notebook has just scratched the surface, but with any luck it can provide a reasonable frame for thinking about the Affordable Care Act, and incite additional questions.